Hierarchical aggregation of select network traffic statistics

ABSTRACT

Disclosed herein are systems and methods for the collection, aggregation, and processing of network traffic statistics for a plurality of network appliances in a wide area network. Select network traffic statistics can be collected and associated with a hierarchical string, and aggregated over time. In this way, only information that is likely to be relevant is gathered and maintained, allowing for the maintenance of select network traffic statistics for large-scale operations.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application is a Continuation of, and claims the priority benefit of, U.S. patent application Ser. No. 15/180,981 filed on Jun. 13, 2016. The disclosure of the above-referenced application is incorporated herein by reference in its entirety for all purposes.

TECHNICAL FIELD

This disclosure relates generally to the collection, aggregation, and processing of network traffic statistics for a plurality of network appliances.

BACKGROUND

The approaches described in this section could be pursued, but are not necessarily approaches that have previously been conceived or pursued. Therefore, unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section.

An increasing number of network appliances, physical and virtual, are deployed in communication networks such as wide area networks (WAN). For each network appliance, it may be desirable to monitor attributes and statistics of the data traffic handled by the device. For example, information can be collected regarding source IP addresses, destination IP addresses, traffic type, port numbers, etc. for the traffic that passes through the network appliance. Typically this information is collected for each data flow using industry standards such as NetFlow and IPFIX. The collected data is transported across the network to a collection engine, stored in a database, and can be utilized for running queries and generating reports regarding the network.

Since there can be any number of data flows processed by a network appliance each minute (hundreds, thousands, or even millions), this results in a large volume of data that is collected each minute, for each network appliance. As the number of network appliances in a communication network increases, the amount of data generated can quickly become unmanageable. Moreover, transporting all of this data across the network from each network appliance to the collection engine can be a significant burden, as well as storing and maintaining a database with all of the data. Further, it may take longer to run a query and generate a report since the amount of data to be processed and analyzed is so large.

Thus, there is a need for a more efficient mechanism for collecting and storing network traffic statistics for a large number of network appliances in a communication network.

SUMMARY

This summary is provided to introduce a selection of concepts in a simplified form that are further described in the Detailed Description below. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.

In various exemplary embodiments of the present disclosure, a system for aggregating select network traffic statistics is disclosed. The system comprises a plurality of network appliances in a communication network configured to collect a plurality of flow attributes for network traffic through each network appliance, build a plurality of hierarchical strings of network traffic flow attributes with extracted attribute values of those flow attributes, extract at least one network metric for at least one network characteristic associated with each of the plurality of hierarchical strings, and aggregate the at least one network metric for the at least one network characteristic over the plurality of flows, and transmit the aggregated information to a network information collector in communication with each network appliance; and the network information collector configured to receive the information from each network appliance, and provide the information to a user on a graphical user display in response to the user running a query on the received information.

In other embodiments, a method for aggregating select network traffic statistics for each of a plurality of network appliances connected in a communication network is disclosed. The method comprises: for each flow from a network appliance, extracting an attribute value of a first flow attribute; for each flow from the network appliance, extracting an attribute value of a second flow attribute; building at least one hierarchical string with the extracted attribute values; extracting at least one network metric for at least one network characteristic associated with the at least one hierarchical string; aggregating the at least one network metric for the at least one network characteristic over a plurality of flows; and transmitting the at least one hierarchical string and associated aggregated network metrics to a network information collector in communication with the network appliance.

Other features, examples, and embodiments are described below.

BRIEF DESCRIPTION OF THE DRAWINGS

Embodiments are illustrated by way of example and not by limitation in the figures of the accompanying drawings, in which like references indicate similar elements.

FIG. 1A depicts an exemplary system of the prior art.

FIG. 1B depicts an exemplary system within which the present disclosure can be implemented.

FIG. 2 illustrates a block diagram of a network appliance, in an exemplary implementation of the disclosure.

FIG. 3 depicts an exemplary flow table at a network appliance.

FIG. 4A depicts an exemplary accumulating map at a network appliance.

FIG. 4B depicts exemplary information from a row of an accumulating map.

FIG. 5A depicts an exemplary sorting via bins for an accumulating map.

FIG. 5B depicts an exemplary eviction policy for an accumulating map.

FIG. 6 depicts an exemplary method for building a hierarchical string.

DETAILED DESCRIPTION

The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations, in accordance with exemplary embodiments. These exemplary embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the present subject matter. The embodiments can be combined, other embodiments can be utilized, or structural, logical, and electrical changes can be made without departing from the scope of what is claimed. The following detailed description is therefore not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents. In this document, the terms “a” and “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive “or,” such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated.

The embodiments disclosed herein may be implemented using a variety of technologies. For example, the methods described herein may be implemented in software executing on a computer system containing one or more computers, or in hardware utilizing either a combination of microprocessors or other specially designed application-specific integrated circuits (ASICs), programmable logic devices, or various combinations thereof. In particular, the methods described herein may be implemented by a series of computer-executable instructions residing on a storage medium, such as a disk drive, or computer-readable medium.

The embodiments described herein relate to the collection, aggregation, and processing of network traffic statistics for a plurality of network appliances.

FIG. 1A depicts an exemplary system 100 within which embodiments of the prior art are implemented. The system comprises a plurality of network appliances 110 in communication with a flow information collector 120 over one or more wired or wireless communication network(s) 160. The flow information collector 120 is further in communication with one or more flow database(s) 125, which in turn is in communication with a reporting engine 140 that is accessible by a user 150.

Network appliance 110 collects information about network flows that are processed through the appliance and maintains flow records 112. These flow records are transmitted to the flow information collector 120 and maintained in flow database 125. User 150 can access information from these flow records 112 via reporting engine 140.

FIG. 1B depicts an exemplary system 170 within which the present disclosure can be implemented. The system comprises a plurality of network appliances 110 in communication with a network information collector 180 over one or more wired or wireless communication network(s) 160. The network information collector 180 is further in communication with one or more database(s) 130, which in turn is in communication with a reporting engine 140 that is accessible by a user 150. While network information collector 180, database(s) 130, and reporting engine 140 are depicted in the figure as separate, one or more of these engines can be part of the same computing machine or distributed across many computers.

In a wide area network, there can be multiple network appliances deployed in one or more geographic locations. Each network appliance 110 comprises hardware and/or software elements configured to receive data and optionally perform any type of processing, including but not limited to, WAN optimization techniques to the data, before transmitting to another appliance. In various embodiments, the network appliance 110 can be configured as an additional router or gateway. If a network appliance has multiple interfaces, it can be transparent on some interfaces, and act like a router/bridge on others. Alternatively, the network appliance can be transparent on all interfaces, or appear as a router/bridge on all interfaces. In some embodiments, network traffic can be intercepted by another device and mirrored (copied) onto network appliance 110. The network appliance 110 may further be either physical or virtual. A virtual network appliance can be in a virtual private cloud (not shown), managed by a cloud service provider, such as Amazon Web Services, or others.

Network appliance 110 collects information about network flows that are processed through the appliance in flow records 112. From these flow records 112, network appliance 110 further generates an accumulating map 114 containing select information from many flow records 112 aggregated over a certain time period. The flow records 112 and accumulating map 114 generated at network appliance 110 are discussed in further detail below with respect to FIGS. 3 and 4.

At certain time intervals, network appliance 110 transmits information from the accumulating map 114 (and not flow records 112) to network information collector 180, which maintains this information in one or more database(s) 130. User 150 can access information from these accumulating maps via reporting engine 140, or in some instances user 150 can access information from these accumulating maps directly from a network appliance 110.

FIG. 2 illustrates a block diagram of a network appliance 110, in an exemplary implementation of the disclosure. The network appliance 110 includes a processor 210, a memory 220, a WAN communication interface 230, a LAN communication interface 240, and a database 250. A system bus 280 links the processor 210, the memory 220, the WAN communication interface 230, the LAN communication interface 240, and the database 250. Line 260 links the WAN communication interface 230 to another device, such as another appliance, router, or gateway, and line 270 links the LAN communication interface 240 to a user computing device, or other networking device. While network appliance 110 is depicted in FIG. 2 as having these exemplary components, the appliance may have additional or fewer components.

The database 250 comprises hardware and/or software elements configured to store data in an organized format to allow the processor 210 to create, modify, and retrieve the data. The hardware and/or software elements of the database 250 may include storage devices, such as RAM, hard drives, optical drives, flash memory, and magnetic tape.

In some embodiments, some network appliances comprise identical hardware and/or software elements. Alternatively, in other embodiments, some network appliances may include hardware and/or software elements providing additional processing, communication, and storage capacity.

Each network appliance 110 can be in communication with at least one other network appliance 110, whether in the same geographic location, different geographic location, private cloud network, customer datacenter, or any other location. As understood by persons of ordinary skill in the art, any type of network topology may be used. There can be one or more secure tunnels between one or more network appliances. The secure tunnel may be utilized with encryption (e.g., IPsec), access control lists (ACLs), compression (such as header and payload compression), fragmentation/coalescing optimizations and/or error detection and correction provided by an appliance.

A network appliance 110 can further have a software program operating in the background that tracks its activity and performance. For example, information about data flows that are processed by the network appliance 110 can be collected. Any type of information about a flow can be collected, such as header information (source port, destination port, source address, destination address, protocol, etc.), packet count, byte count, timestamp, traffic type, or any other flow attribute. This information can be stored in a flow table 300 at the network appliance 110. Flow tables will be discussed in further detail below, with respect to FIG. 3.

In exemplary embodiments, select information from flow table 300 is aggregated and populated into an accumulating map, which is discussed in further detail below with respect to FIG. 4. Information from the accumulating map is transmitted by network appliance 110 across communication network(s) 160 to network information collector 180. In this way, the information regarding flows processed by network appliance 110 is not transmitted directly to network information collector 180, but rather a condensed and aggregated version of selected flow information is transmitted across the network, creating less network traffic.

After a flow table 300 is used to populate an accumulating map, or on a certain periodic basis or activation of a condition, flow table 300 may be discarded by network appliance 110 and a new flow table is started. Similarly, after an accumulating map 400 is received by network information collector 180, or on a certain periodic basis or activation of a condition, accumulating map 400 may be discarded by network appliance 110 and a new accumulating map is started.

Returning to FIG. 1B, network information collector 180 comprises hardware and/or software elements, including at least one processor, for receiving data from network appliance 110 and processing it. Network information collector 180 may process data received from network appliance 110 and store the data in database(s) 130. In various embodiments, database(s) 130 is a relational database that stores the information from accumulating map 400. The information can be stored directly into database(s) 130 or separated into columns and then stored in database(s) 130.

Database(s) 130 is further in communication with reporting engine 140. Reporting engine 140 comprises hardware and/or software elements, including at least one processor, for querying data in database(s) 130, processing it, and presenting it to user 150 via a graphical user interface. In this way, user 150 can run any type of query on the stored data. For example, a user can run a query requesting information on the most visited websites, or a “top talkers” report, as discussed in further detail below.

FIG. 3 depicts an exemplary flow table 300 at network appliance 110 for flows 1 through N, with N representing any number. The flow table contains one or more rows of information for each flow that is processed through network appliance 110. Data packets transmitted and received between a single user and a single website that the user is browsing can be parsed into multiple flows. Thus, one browsing session for a user on a website may comprise many flows. Typically a TCP flow begins with a SYN packet and ends with a FIN packet. Other methods can be used for determining the start and end of non-TCP flows. The attributes of each of these flows, while they may be identical or substantially similar, are by convention stored in different rows of flow table 300 since they are technically different flows.

In exemplary embodiments, flow table 300 may collect certain information about the flow, such as header information 310, network information 320, and other information 330. As would be understood by a person of ordinary skill in the art, flow table 300 can comprise fewer or additional fields than depicted in FIG. 3. Moreover, even though header information 310 is depicted as having three entries in exemplary flow table 300, there can be fewer or additional entries for header information. Similarly, there can be fewer or additional entries for network information 320 and for other information 330 than the number of entries depicted in exemplary flow table 300.

Header information 310 can comprise any type of information found in a packet header, for example, source port, destination port, source address (such as IP address), destination address, protocol. Network information 320 can comprise any type of information regarding the network, such as a number of bytes received or a number of bytes transmitted during that flow. Further, network information 320 can contain information regarding other characteristics such as loss, latency, jitter, re-ordering, etc. Flow table 300 may store a sum of the number of packets or bytes of each characteristic, or a mathematical operator other than the sum, such as maximum, minimum, mean, median, average, etc. Other information 330 can comprise any other type of information regarding the flow, such as traffic type or domain name (instead of address).

In an example embodiment, entry 340 of flow N is the source port for the flow, entry 345 is the destination port for the flow, and entry 350 is the destination IP address for the flow. Entry 355 is the domain name for the website that flow N originates from or is directed to, entry 360 denotes that the flow is for a voice traffic type, and entry 365 is an application name (for example from deep packet inspection (DPI)). Entry 370 contains the number of packets in the flow and entry 375 contains a number of bytes in the flow.

The flow information regarding every flow is collected by the network appliance 110 at all times, in the background. A network appliance 110 could have one million flows every minute, in which case a flow table for one minute of data for network appliance 110 would have one million rows. Over time, this amount of data becomes cumbersome to process, synthesize, and manipulate. Conventional systems may transport a flow table directly to a flow information collector, or to reduce the amount of data, retain only a fraction of the records from the flow table as a sample. In contrast, embodiments of the present disclosure reduce the amount of data to be processed regarding flows, with minimal information loss, by synthesizing selected information from flow table 300 into an accumulating map. This synthesis can occur on a periodic basis (such as every minute, every 5 minutes, every hour, etc.), or upon the meeting of a condition, such as number of flows recorded in the flow table 300, network status, or any other condition.

FIG. 4A depicts exemplary accumulating maps that are constructed from information from flow table 300. A string of information is built in a hierarchical manner from information in flow table 300. A network administrator can determine one or more strings of information to be gathered. For example, a network administrator may determine that information should be collected regarding a domain name, user computing device, and user computer's port number that is accessing that domain. A user computing device can identify different computing devices utilized by the same user (such as a laptop, smartphone, desktop, tablet, smartwatch, etc.). The user computing device can be identified in any manner, such as by host name, MAC address, user ID, etc.

Exemplary table 400 has rows 1 through F, with F being any number, for the hierarchical string “/domain name/computer/port” that is built from this information. Since the accumulating map 400 is an aggregation of flow information, F will be a much smaller value than N, the total number of flows from flow table 300.

Exemplary table 450 shows data being collected for a string of source IP address and destination IP address combinations. Thus, information regarding which IP addresses are communicating with each other is accumulated. Network appliance 110 can populate an accumulating map for any number of strings of information from flow table 300. In an exemplary embodiment, network appliance 110 populates multiple accumulating maps, each for a different string hierarchy of information from flow table 300. While FIG. 4A depicts only two string hierarchies, there can be fewer or additional strings of information collected in accumulating maps.

Row 410 in exemplary accumulating map 400 shows that during the time interval represented, sampledomain1 was accessed by computer1 from port1. All of the flows where sampledomain1 was accessed by computer1 from port1 in flow table 300 are aggregated into a single row, row 410, in accumulating map 400. The network information 320 may be aggregated for the flows to depict a total number of bytes received and a total number of packets received from sampledomain1 accessed by computer1 via port1 during the time interval of flow table 300. In this way, a large number of flows may be condensed into a single row in accumulating map 400.
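
By way of illustration only, the aggregation of flows into rows such as row 410 could be sketched as follows. The field names, the sample values, and the helper names build_string and accumulate are assumptions made for this sketch and do not appear in the figures.

```python
from collections import defaultdict

# Hypothetical flow records; the field names and values are illustrative only.
flow_table = [
    {"domain": "sampledomain1", "computer": "computer1", "port": "port1", "bytes": 100, "packets": 3},
    {"domain": "sampledomain1", "computer": "computer1", "port": "port1", "bytes": 250, "packets": 5},
    {"domain": "sampledomain1", "computer": "computer1", "port": "port2", "bytes": 40, "packets": 1},
]

def build_string(flow, attributes):
    """Join the selected attribute values into a slash-delimited hierarchical string."""
    return "/" + "/".join(str(flow[a]) for a in attributes)

def accumulate(flows, attributes, characteristics):
    """Sum each network characteristic over all flows that share a hierarchical string."""
    acc = defaultdict(lambda: dict.fromkeys(characteristics, 0))
    for flow in flows:
        row = acc[build_string(flow, attributes)]
        for c in characteristics:
            row[c] += flow[c]
    return dict(acc)

accumulating_map = accumulate(flow_table, ["domain", "computer", "port"], ["bytes", "packets"])
```

In this sketch the two port1 flows collapse into one row and the port2 flow forms a second row, so three flow records yield two accumulated records.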

As would be understood by a person of ordinary skill in the art, while accumulating map 400 depicts a total number of bytes received and a total number of packets received (also referred to herein as a network characteristic), any attribute can be collected and aggregated into accumulating map 400. For example, instead of a sum of bytes received, accumulating map 400 can track a maximum value, minimum value, median, percentile, or other numeric attribute for a string. Additionally, the network characteristic can be other characteristics besides number of packets or number of bytes. Loss, latency, re-ordering, and other characteristics can be tracked for a string in addition to, or instead of, packets and bytes, such as number of flows that are aggregated into the row. For example, packet loss and packet jitter can be measured by time stamps and serial numbers from the flow table. Additional information on measurement of network characteristics can be found in commonly owned U.S. Pat. No. 9,143,455 issued on Sep. 22, 2015 and entitled “Quality of Service Using Multiple Flows”, which is hereby incorporated herein in its entirety.
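
As a non-limiting sketch, tracking an operator other than a sum might look like the following. The latency values, the COMBINERS table, and the aggregate helper are hypothetical names introduced for this example only.

```python
# Hypothetical latency samples in milliseconds, keyed by hierarchical string.
samples = [
    ("/sampledomain1/computer1/port1", 20),
    ("/sampledomain1/computer1/port1", 45),
    ("/sampledomain1/computer1/port2", 12),
    ("/sampledomain1/computer1/port1", 31),
]

# Each combiner folds a new sample into the value already stored for the string.
COMBINERS = {
    "sum": lambda old, new: old + new,
    "max": max,
    "min": min,
}

def aggregate(samples, op):
    """Aggregate samples per string using the chosen operator."""
    combine = COMBINERS[op]
    acc = {}
    for key, value in samples:
        acc[key] = combine(acc[key], value) if key in acc else value
    return acc

max_latency = aggregate(samples, "max")
min_latency = aggregate(samples, "min")
```

Operators such as median or percentile cannot be folded in one value at a time in this way and would need additional state per string, which this sketch does not attempt.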

Row 430 shows that the same computer (computer1) accessed the same domain name (sampledomain1), but from a different port (port2). Thus, all of the flows in flow table 300 from port2 of computer1 to sampledomain1 are aggregated into row 430. Similarly, accumulating map 400 can be populated with information from flow table 300 for any number of domains accessed by any number of computers from any number of ports, as shown in row 440.

Flow table 300 may comprise data for one time interval while accumulating map 400 can comprise data for a different time interval. For example, flow table 300 can comprise data for all flows through network appliance 110 over the course of a minute, while data from 60 minutes can all be aggregated into one accumulating map. Thus, if a user returns to the same website from the same computer from the same port within the same hour, even though this network traffic is on a different flow, the data can be combined with the previous flow information for the same parameters into the accumulating map. This significantly reduces the number of records that are maintained. All activity between a computer and a domain from a certain port is aggregated together as one record in the accumulating map, instead of multiple records per flow. This provides information in a compact manner for further processing, while also forgoing the maintenance of all details about more specific activities.
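
A minimal sketch of folding several shorter intervals into one longer-interval accumulating map is given below, assuming byte totals have already been computed per string for each minute. The variable names and values are illustrative only.

```python
from collections import Counter

# Hypothetical per-minute byte totals keyed by hierarchical string.
minute_maps = [
    {"/sampledomain1/computer1/port1": 120},
    {"/sampledomain2/computer1/port1": 30},
    {"/sampledomain1/computer1/port1": 80},  # the same user returns later in the hour
]

hourly_map = Counter()
for minute_map in minute_maps:
    # Counter.update adds values for existing keys rather than replacing them.
    hourly_map.update(minute_map)
```

Here the two visits to sampledomain1 end up as one hourly record, even though they were observed in different minutes.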

Exemplary accumulating map 450 depicts flow information for another string: source IP address and destination IP address combinations. In IPv4 addressing alone, there are about four billion possibilities for source IP addresses and about four billion possibilities for destination IP addresses. To maintain a table of all possible IP address combinations between these would be an unwieldy table of information to collect. Further, most combinations for a particular network appliance 110 would be zero. Thus, to maintain large volumes of data in a scalable way, the accumulating map 450 only collects information regarding IP addresses actually used as a source or destination, instead of every possible combination of IP addresses.

The accumulating map 450 can be indexed in different indexing structures, as would be understood by a person of ordinary skill in the art. For example, a hash table can be used where the key is the string and a hash of the string is computed to find a hash bin. In that bin is a list of strings and their associated values. Furthermore, there can be additional indexing to make operations (like finding smallest value) fast, as discussed herein. An accumulating map may comprise the contents of the table, such as that depicted in 400 and 450, and additionally one or more indexing structures and additional information related to the table. In some embodiments, only the table itself from the accumulating map may be transmitted to network information collector 180. In other embodiments, some or all of the additional information, such as indexing information, may be transmitted with the table.

The information from an accumulating map can be collected from the network appliances and then stored in database(s) 130, which may be a relational database. The scheme can use raw aggregated strings and corresponding values in columns of the database(s) 130, or separate columns can be used for each flow attribute of the string and its corresponding values. For example, port, computer, and domain name can all be separate columns in a relational database, rather than stored as one column for the string.
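
The separate-column option could be sketched with an in-memory SQLite database as follows. The table name, column names, and sample rows are assumptions for this example and are not taken from the disclosure.

```python
import sqlite3

# Hypothetical accumulated rows: string -> (bytes received, packets received).
rows = {
    "/sampledomain1/computer1/port1": (350, 8),
    "/sampledomain1/computer1/port2": (40, 1),
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accumulated (domain TEXT, computer TEXT, port TEXT, "
    "bytes_received INTEGER, packets_received INTEGER)"
)
for string, (nbytes, npackets) in rows.items():
    # Split the hierarchical string so each flow attribute gets its own column.
    domain, computer, port = string.strip("/").split("/")
    conn.execute(
        "INSERT INTO accumulated VALUES (?, ?, ?, ?, ?)",
        (domain, computer, port, nbytes, npackets),
    )
conn.commit()

per_domain = conn.execute(
    "SELECT domain, SUM(bytes_received) FROM accumulated GROUP BY domain"
).fetchall()
```

Storing each attribute in its own column lets ordinary GROUP BY queries roll the data up along any level of the string.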

The reporting engine 140 allows a user 150 or network administrator to run a query and generate a report from information in accumulating maps that was stored in database(s) 130. For example, a user 150 can query which websites were visited by searching “/domain/*”. A user 150 can query the top traffic types by searching “/*/traffic type”. Multi-dimensional searches can also be run on data in database(s) 130. For example, who are the top talkers and which websites are they visiting? For the top destinations, who is going there? For the top websites, what are the top traffic types? A network administrator can configure the system to aggregate selected flow information based specifically on the most common types of queries that are run on network data. Further, multi-dimensional queries can be run on this aggregated information, even though the data is not stored in a multi-dimensional format (such as a cube).
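
One possible reading of such queries is a roll-up over a single level of the hierarchical string, with the remaining levels treated as wildcards. The sketch below follows that reading; the “/computer/domain” string layout, the byte totals, and the roll_up helper are assumptions, and the actual query syntax of reporting engine 140 is not specified here.

```python
from collections import Counter

# Hypothetical accumulated byte totals for a "/computer/domain" string.
accumulated = {
    "/computer1/sampledomain1": 350,
    "/computer1/sampledomain2": 40,
    "/computer2/sampledomain1": 500,
}

def roll_up(accumulated, level):
    """Total the metric over every string sharing the same value at `level` (0-based)."""
    totals = Counter()
    for string, value in accumulated.items():
        totals[string.strip("/").split("/")[level]] += value
    # most_common() orders the results from largest to smallest total.
    return totals.most_common()

top_talkers = roll_up(accumulated, 0)
top_sites = roll_up(accumulated, 1)
```

Rolling up level 0 answers a top talkers question, while rolling up level 1 answers a most visited websites question from the same stored rows.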

Further, by collecting flow information for a certain time interval in flow table 300 (e.g., once a minute), and aggregating selected flow information into one or more accumulating maps for a set time interval (e.g., once an hour) at the network appliance 110, only relevant flow information is gathered by network information collector 180 and maintained in database(s) 130. This allows for efficient scalability of a large number of network appliances in a WAN, since the amount of information collected and stored is significantly reduced, compared to simply collecting and storing all information about all flows through every network appliance for all time. Through an accumulating map, information can be aggregated by time, appliance, traffic type, IP address, website/domain, or any other attribute associated with a flow.

While the strings of an accumulating map are depicted herein with slashes, the information can be stored in an accumulating map in any format, such as other symbols or even no symbol at all. A string can be composed of binary records joined together to make a string, or normal ASCII text, Unicode text, or concatenations thereof. For example, row 410 can be represented as “sampledomain1, computer1, port1” or in any number of ways. Further, instead of delimiting a string by characters, it can be delimited by links and values. Information can also be sorted lexicographically.

FIG. 4B depicts exemplary information from a row of an accumulating map. A string is composed of an attribute value 412 (such as 1.2.3.4) of a first attribute 411 (such as source IP address), and an attribute value 414 (such as 5.6.7.8) of a second attribute 413 (such as destination IP address). For each string of information, there is an associated network characteristic 415 (such as number of bytes received) and its corresponding network metric 416 (such as 54) and there can optionally be a second network characteristic 417 (number of packets received) and its corresponding network metric 418 (such as 13). While two network characteristics are depicted here, there can be only one network characteristic or three or more network characteristics. Similarly, there can be fewer or additional attributes in a string. This information can also be stored as a binary key string 419 as depicted in the figure.

Furthermore, while data is discussed herein as being applicable to a particular flow, a similar mechanism can be utilized to gather data for a tunnel, instead of just a flow. For example, a string of information comprising “/tunnelname/application/website” can be gathered in an accumulating map. In this way, information regarding which tunnel a flow goes into and which application is using that tunnel can be collected and stored. Data packets can be encapsulated into tunnel packets, and a single string may collect information regarding each of these packets as a way of tracking tunnel performance.

In various embodiments, an accumulating map, such as map 400, can have a maximum or target number of rows or records that can be maintained. Since one purpose of the accumulating map is to reduce the amount of flow information that is collected, transmitted, and stored, it can be advantageous to limit the size of the accumulating map. Once a defined number of records is reached, then an eviction policy can be applied to determine how new entries are processed. The eviction policy can be triggered upon reaching a maximum number of records, or upon reaching a lower target number of records.

In one eviction policy, any new strings of flow information that are not already in the accumulating map will simply be discarded for that time interval, until a new accumulating map is started for the next time interval.

In a second eviction policy, the strings of information that constitute overflow are summarized into a log file, called an eviction log. The eviction log can be post-processed and transmitted to the network information collector 180 at substantially the same time as information from the accumulating map. Alternatively, the eviction log may be consulted only at a later time when further detail is required.
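
A minimal sketch of the second eviction policy follows, assuming the eviction log is an in-memory dictionary that sums overflow values per string. The class name BoundedAccumulatingMap and its fields are hypothetical, and the form of summarization used for the log is an assumption.

```python
class BoundedAccumulatingMap:
    """Accumulating map with a record limit; overflow strings go to an eviction log."""

    def __init__(self, max_records):
        self.max_records = max_records
        self.rows = {}          # string -> aggregated metric
        self.eviction_log = {}  # overflow strings, summarized by summing (an assumption)

    def add(self, string, value):
        if string in self.rows:
            # Existing strings keep aggregating even when the map is full.
            self.rows[string] += value
        elif len(self.rows) < self.max_records:
            self.rows[string] = value
        else:
            # The map is full and the string is new, so record it as overflow.
            self.eviction_log[string] = self.eviction_log.get(string, 0) + value

bounded = BoundedAccumulatingMap(max_records=2)
for string, value in [("/a", 10), ("/b", 5), ("/c", 7), ("/a", 1), ("/c", 3)]:
    bounded.add(string, value)
```

Dropping the final else branch instead of logging would give the first eviction policy, in which new strings are simply discarded for the interval.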

In a third eviction policy, when a new string needs to be added to an accumulating map, then an existing record can be moved from the accumulating map into an eviction log to make space for the new string, which is then added to the accumulating map. The determination of which existing record to purge from the accumulating map can be based on a metric. For example, the existing entry with the least number of bytes received can be evicted. In various embodiments there can also be a time parameter for this so that new strings have a chance to aggregate and build up before automatically being evicted for having the lowest number of bytes. That is, to avoid a situation where the newest entry is constantly evicted, a time parameter can be imposed to allow for any desired aggregation of flows for the string.
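The third policy, including the time parameter, could be sketched as below. The function name, the record layout, and the min_age parameter are hypothetical. The handling of the case in which no record is yet old enough to evict (the new string is written to the eviction log instead) is an assumption of this sketch and is not specified in the disclosure.

```python
def add_with_eviction(records, eviction_log, key, nbytes,
                      now, max_records, min_age):
    """records maps string -> [aggregated bytes, time first seen]."""
    if key in records:
        records[key][0] += nbytes
        return
    if len(records) >= max_records:
        # Only records that have had min_age to aggregate may be evicted.
        eligible = [k for k, (_, first_seen) in records.items()
                    if now - first_seen >= min_age]
        if not eligible:
            # Assumption: log the new string rather than evict a young record.
            eviction_log[key] = eviction_log.get(key, 0) + nbytes
            return
        # Full scan for the eligible record with the fewest bytes.
        victim = min(eligible, key=lambda k: records[k][0])
        evicted_bytes = records.pop(victim)[0]
        eviction_log[victim] = eviction_log.get(victim, 0) + evicted_bytes
    records[key] = [nbytes, now]
```

Here the victim is found by scanning the whole map; an indexed structure would avoid that scan at the cost of extra bookkeeping.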

In some embodiments, to find the existing entry with the least number of bytes to be evicted, the whole accumulating map can be scanned. In other embodiments, the accumulating map is already indexed (such as via a hash table) so it is already sorted and the lowest value can be easily found.

In further embodiments, information from an accumulating map can be stored in bins such as those depicted in FIG. 5A. In the exemplary embodiment of FIG. 5A, aggregated network metric values of a network characteristic are displayed, and bins are labeled with various numeric ranges, such as 0-10, 11-40 and 41-100. Each network metric is associated with the bin of its numeric range. Thus strings and their corresponding aggregated values can be placed in an indexing structure for the accumulating map in accordance with the metric value of their corresponding network characteristic. As a network metric increases (for example from new flows being aggregated into the string), or as a network metric decreases (for example from some strings being evicted), then the entry can be moved to a different bin in accordance with its new numeric range. In an exemplary embodiment, the table of an accumulating map is a first data structure, a bin is a second data structure, and sorting operations can be conducted in a third data structure.

Placing data from accumulating map 400 in bins allows for eviction to occur from the lowest value bin with data. Any record can be evicted from the lowest value bin with data, or the lowest value bin can be scanned to find the entry with the lowest network metric for eviction.
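One possible form of such a binned index is sketched below, using the example ranges 0-10, 11-40 and 41-100. The class BinnedIndex, the helper bin_index, and the choice to place values above 100 in the last bin are assumptions of this sketch.

```python
import bisect

BIN_UPPER_BOUNDS = [10, 40, 100]   # bins 0-10, 11-40, 41-100


def bin_index(metric):
    # Assumption: values above the last bound fall into the last bin.
    return min(bisect.bisect_left(BIN_UPPER_BOUNDS, metric),
               len(BIN_UPPER_BOUNDS) - 1)


class BinnedIndex:
    def __init__(self):
        self.bins = [set() for _ in BIN_UPPER_BOUNDS]
        self.metric = {}               # string -> aggregated metric

    def update(self, key, delta):
        """Aggregate delta into key and move it to the matching bin."""
        old = self.metric.get(key)
        if old is not None:
            self.bins[bin_index(old)].discard(key)
        new = (old or 0) + delta
        self.metric[key] = new
        self.bins[bin_index(new)].add(key)

    def evict_lowest(self):
        """Evict the smallest entry of the lowest bin that holds data."""
        for members in self.bins:
            if members:
                victim = min(members, key=self.metric.__getitem__)
                members.discard(victim)
                return victim, self.metric.pop(victim)
        return None
```

Only the lowest non-empty bin is scanned during eviction, so the cost depends on the size of that bin rather than on the size of the whole map.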

The bins can also be arranged in powers of two to cover bigger ranges of values. For example, bins can have ranges of 0-1, 2-3, 4-7, 8-15, 16-31, 32-63, 64-127 and so on. In this way, the information from the accumulating map does not need to be kept perfectly sorted by network metric, which can require a significant amount of indexing.
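For non-negative integer metrics, the power-of-two bin of a value can be computed directly from the position of its highest set bit, as in the following sketch. The function names pow2_bin and pow2_range are hypothetical.

```python
def pow2_bin(value: int) -> int:
    """Bin number for a non-negative integer metric.

    Bin 0 covers 0-1, bin 1 covers 2-3, bin 2 covers 4-7, and so on.
    """
    return max(value.bit_length() - 1, 0)


def pow2_range(index: int) -> tuple:
    """Inclusive (low, high) range covered by a bin."""
    if index == 0:
        return (0, 1)
    return (1 << index, (1 << (index + 1)) - 1)
```

Because each bin spans twice the range of the one before it, seven bins cover metrics from 0 through 127, and each further bin doubles that reach.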

In another exemplary embodiment, space can be freed up in an accumulating map by combining multiple records that have common attributes. For example, in the accumulating map of FIG. 5B, there are two entries with the same domain and computer, but different port numbers. The data from these entries can be combined by keeping the domain and computer in the string, but removing the port numbers. In this way, two or more records in the accumulating map with common flow attributes can be aggregated into one record by removing the uncommon attributes from the record. The bytes received and packets received for the new condensed record are an aggregation of the previous separate records. In this way, some information may be lost from the accumulating map (through loss of some granularity), but by least importance as defined by the combination of attributes in the string (by removing a lower level but keeping a higher level of information in the string). Alternatively, of the two entries with the same domain and computer but different port numbers, the record with the lowest number of bytes may simply be evicted from the accumulating map and added to the eviction log. There can also be a time interval allotted to the record before it is evicted to allow flow data to be aggregated for that string before eviction.
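The combining of records that share higher-level attributes could be sketched as follows. The function name, the "/domain/computer/port" key layout, and the sample values in the usage note are illustrative assumptions rather than the contents of FIG. 5B.

```python
from collections import defaultdict


def collapse_lowest_level(records):
    """Merge records that differ only in their last (lowest) level.

    records maps a hierarchical string to a dict of metrics.
    """
    groups = defaultdict(list)
    for key in records:
        parent = key.rsplit("/", 1)[0]      # drop the lowest level
        groups[parent].append(key)

    result = defaultdict(lambda: defaultdict(int))
    for parent, children in groups.items():
        # Merge only when two or more records share a non-empty parent.
        merge = len(children) >= 2 and parent != ""
        for child in children:
            target = parent if merge else child
            for name, value in records[child].items():
                result[target][name] += value
    return {k: dict(v) for k, v in result.items()}
```

For instance, records keyed "/example.com/pc1/80" and "/example.com/pc1/443" would be condensed into a single "/example.com/pc1" record whose bytes and packets are the sums of the two, while a lone "/example.com/pc2/22" record would be left unchanged.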

In a fourth eviction policy, a batch eviction can be conducted on the accumulating map to free up space. For example, a determination may be made of which records are the least useful and then those are evicted from the accumulating map and logged in the eviction log. In an exemplary embodiment, an accumulating map may be capable of having 10,000 records. A batch eviction may remove 1,000 records at a time. However, any number of records can be moved in a batch eviction process, and an accumulating map size can be set to any number of records. A batch eviction can also remove one or more bins of information.
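A batch eviction could be sketched as below. Treating the records with the fewest aggregated bytes as the "least useful" is an assumption of this sketch, as are the function and parameter names.

```python
import heapq


def batch_evict(records, eviction_log, batch_size):
    """Move the batch_size lowest-valued records into the eviction log.

    records maps hierarchical string -> aggregated bytes.
    """
    # Select the keys with the smallest aggregated values in one pass.
    victims = heapq.nsmallest(batch_size, records, key=records.get)
    for key in victims:
        eviction_log[key] = eviction_log.get(key, 0) + records.pop(key)
    return victims
```

With a 10,000-record map and a batch size of 1,000, a single call would return the map to 9,000 records.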

FIG. 6 depicts an exemplary method for building a hierarchical string and aggregating the associated values, as discussed herein. In step 610, information about network traffic flows is collected at a network appliance. In step 620, an attribute value of a first attribute (or flow attributes) is extracted when the flow ends, or on a periodic basis. For example, if a flow attribute is source IP address, then the attribute value of the source IP address (such as 1.2.3.4) is extracted. An attribute value of a second flow attribute can also be extracted. There can be any number of flow attributes extracted from flow information. In step 630, at least one hierarchical string is built with the extracted attribute values. For example, source IP may be a part of only one, or multiple different hierarchical strings. Network metric(s) for the associated network characteristic(s) of the hierarchical string(s) are extracted in step 640, and the network metrics are aggregated for the different flows into an accumulating map record for each hierarchical string in step 650. For example, a string of “/source IP/destination IP” can be built from the various source and destination IP address combinations with the aggregated network metrics of the network characteristic of number of bytes exchanged between each source IP and destination IP combination.
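Steps 620 through 650 can be sketched as a single aggregation pass over collected flow records. The flow field names (src_ip, dst_ip, bytes, packets), the use of attribute-name tuples as string templates, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_flows(flows, templates, metrics=("bytes", "packets")):
    """Build one accumulating map per hierarchical-string template.

    flows:     iterable of dicts describing individual flows
    templates: tuples of attribute names, highest level first
    """
    maps = {t: defaultdict(lambda: dict.fromkeys(metrics, 0))
            for t in templates}
    for flow in flows:
        for template, acc in maps.items():
            # Steps 620/630: extract values and build the string.
            key = "/" + "/".join(str(flow[attr]) for attr in template)
            # Steps 640/650: extract metrics and aggregate them.
            for name in metrics:
                acc[key][name] += flow[name]
    return {t: dict(acc) for t, acc in maps.items()}
```

Passing both ("src_ip", "dst_ip") and ("src_ip",) as templates shows how the source IP can take part in more than one hierarchical string, with each template aggregated in its own map.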

The aggregated information may be sent from each network device to the network information collector 180 as discussed herein. The information can be transmitted as raw data, or may be subjected to processing such as encryption, compression, or other type of processing. The network information collector 180 may initiate a request for the data from each network appliance, or the network appliance may send it automatically, such as on a periodic basis after the passage of a certain amount of time (for example, every minute, every 5 minutes, every hour, etc.).
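As one example of such processing, an accumulating map could be serialized and compressed before transmission. The use of JSON and zlib here is an assumption chosen for illustration; the disclosure does not name a wire format, and the function names are hypothetical.

```python
import json
import zlib


def pack_map(records):
    """Serialize an accumulating map and compress it for transmission."""
    raw = json.dumps(records, sort_keys=True).encode("utf-8")
    return zlib.compress(raw)


def unpack_map(payload):
    """Inverse of pack_map, as might run at the collector."""
    return json.loads(zlib.decompress(payload).decode("utf-8"))
```

Encryption, if used, would be applied to the compressed payload and is omitted from this sketch.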

While the method has been described in these discrete steps, various steps may occur in a different order, or concurrently. Further, this method may be practiced for each incoming flow or outgoing flow of a network appliance.

Thus, methods and systems for aggregated select network traffic statistics are disclosed. Although embodiments have been described with reference to specific examples, it will be evident that various modifications and changes can be made to these example embodiments without departing from the broader spirit and scope of the present application. Therefore, these and other variations upon the exemplary embodiments are intended to be covered by the present disclosure. Accordingly, the specification and drawings are to be regarded in an illustrative rather than a restrictive sense.

What is claimed is:
 1. A method for aggregating select network traffic statistics for each of a plurality of network appliances connected in a communication network, the method comprising: for each flow from a first network appliance, extracting a first attribute value of a first flow attribute; for each flow from the first network appliance, extracting a second attribute value of a second flow attribute; building at least one hierarchical string with the extracted first attribute value and the extracted second attribute value, extracting at least one network metric for at least one network characteristic associated with the at least one hierarchical string; aggregating the at least one network metric for the at least one network characteristic over a plurality of flows to and from the first network appliance in the communication network; generating an accumulating map that is updated in substantially real time, the accumulating map comprising the at least one hierarchical string and associated aggregated network metrics for the first flow attribute and the second flow attribute of the hierarchical string, wherein the accumulating map has a target number of entries for a specified time period and an eviction policy determines how information is aggregated once the accumulating map reaches its target number of entries for the specified time period, the eviction policy determining that a record is aggregated into a higher level record of the accumulating map and is evicted from the accumulating map; and transmitting the accumulating map to a network information collector in communication with the plurality of network appliances.
 2. The method of claim 1, wherein information regarding each flow to or from a given network appliance is collected in a flow table.
 3. The method of claim 1, wherein the first and the second flow attributes are extracted at a first time interval.
 4. The method of claim 1, wherein the accumulating map is transmitted to the network information collector at a second time interval, the second time interval being a different amount of time than a first time interval.
 5. The method of claim 1, wherein a new accumulating map is started at the first network appliance after the aggregated information is transmitted to the network information collector.
 6. The method of claim 1, wherein the hierarchical string represents a subset of network traffic statistics collected for the first network appliance.
 7. The method of claim 1, wherein the second attribute of the hierarchical string further defines the first attribute of the hierarchical string.
 8. The method of claim 1, wherein the accumulating map comprises an eviction log for collected information in excess of the target number of entries for the specified time period, the eviction log comprising a summary of strings of information in excess of the target number of entries for the specified time period.
 9. The method of claim 1, wherein the eviction policy determines that once the target number of entries is reached for the specified time period, any new information collected will be discarded, and not aggregated during that time period.
 10. The method of claim 1, wherein the eviction policy further determines that an evicted record is moved to an eviction log when aggregated into a higher level record of the accumulating map.
 11. The method of claim 1, wherein the eviction policy determines that a portion of at least one hierarchical string of information is removed from the accumulating map to reduce the number of entries below a maximum number of entries for the specified time period.
 12. The method of claim 1, wherein the eviction policy removes a predetermined number of records from the accumulating map and moves them to an eviction log, when a maximum number of entries for the specified time period is reached.
 13. The method of claim 1, further comprising: in response to a query regarding network traffic from a user, displaying a portion of the information collected from each network appliance on a graphical user interface to the user.
 14. The method of claim 9, wherein the eviction log is post-processed to minimize information loss.
 15. The method of claim 1, wherein the aggregated information is stored in bins.
 16. The method of claim 1, further comprising: for each flow from the first network appliance, extracting a second network metric of the first flow attribute and its corresponding value.
 17. A system for aggregating select network traffic statistics, comprising: a plurality of network appliances in a communication network, each of the plurality of network appliances configured to: collect a plurality of flow attributes for network traffic through each network appliance; build at least one hierarchical string of network traffic flow attributes with an extracted first attribute value and an extracted second attribute value of the collected flow attributes; extract at least one network metric for at least one network characteristic associated with each of the at least one hierarchical string; aggregate the at least one network metric for the at least one network characteristic over a plurality of flows to or from the network appliance; generate an accumulating map that is updated in substantially real time, the accumulating map comprising the at least one hierarchical string and associated aggregated network metrics for a first flow attribute and a second flow attribute of the hierarchical string, wherein the accumulating map has a target number of entries for a specified time period and an eviction policy determines that a record is aggregated into a higher level record of the accumulating map and is evicted from the accumulating map when the accumulating map reaches the target number of entries for the specified time period; and transmit the accumulating map to a network information collector in communication with each network appliance; and the network information collector configured to receive information from each network appliance, and provide the information to a user on a graphical user display.
 18. The system of claim 17, wherein the second attribute of the hierarchical string further defines the first attribute of the hierarchical string.
 19. The system of claim 17, wherein each of the plurality of network appliances further generates at least one indexing data structure for the accumulating map.
 20. The system of claim 17, wherein the extracted first attribute value and the extracted second attribute value are extracted at a first time interval.
 21. The system of claim 17, wherein the accumulating map is transmitted to the network information collector at a second time interval, the second time interval being a different amount of time than a first time interval at which the extracted first attribute value and the extracted second attribute value are extracted.
 22. The system of claim 17, wherein the network appliance is further configured to: generate a new accumulating map, after a previous accumulating map is transmitted to the network information collector.